Recent Advances in Continuous Speech Recognition Using the Time-Sliced Paradigm
Authors
Ingrid Kirschning, Hideto Tomabechi and Jun-Ichi Aoe
Dept. of Information Science and Intelligent Systems, Faculty of Engineering, University of Tokushima, 2-1 Minami Josanjima-Cho, Tokushima Shi, 770 Japan
Abstract
We developed a method called Time-Slicing [1] for analyzing the speech signal. It enables a neural network to recognize connected speech as it arrives. The input signal does not have to be fitted into a fixed time format, and it does not have to be labeled or segmented phoneme by phoneme. The network produces an immediate hypothesis of the recognized phoneme, and it is small enough to run even on a PC. To implement the Time-Slicing paradigm, we tried two neural network architectures. The first is the Time-Sliced Recurrent Recognizer (TSRR) [2], which uses an Elman-style recurrent network. The second is the Time-Sliced Recurrent Cascade-Correlation Network (TS-RCCN) [3], which uses the Recurrent Cascade-Correlation (RCC) architecture [4]. The RCC is a powerful feature learner. By modifying its original structure, we improved its ability to generalize and to recognize phonemes. This modified structure is called the Parallel-RCC [5]. This paper presents the latest recognition results obtained with the Parallel-RCC. It also describes several attempts to analyze the network's output further with an error-recovery method to obtain the final result.
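The abstract does not describe the internals of the TSRR. The Python sketch below illustrates why an Elman-style recurrent network can process connected speech as it arrives. At each time slice, the hidden layer sees the current input frame together with a copy of its own previous activations (the context units). Because of this, the input never has to be padded or cut to a fixed length, and the network can emit a phoneme hypothesis after every frame.

Several details are assumptions made for illustration and are not taken from the paper:

- one feature vector per time slice
- a hypothetical six-symbol phoneme set
- random, untrained weights
- `tanh` hidden units
- a softmax output layer

Training and the error-recovery stage are omitted.

```python
import math
import random

# Hypothetical phoneme set, for illustration only (not the paper's inventory).
PHONEMES = ["a", "i", "u", "e", "o", "sil"]


def _random_matrix(rng, rows, cols):
    """Small random weights in [-0.5, 0.5]; a trained network would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ElmanRecognizer:
    """Minimal Elman-style recurrent network with untrained random weights.

    At each time slice the hidden layer receives the current input frame plus
    a copy of its own previous activations (the context units), so utterances
    of any length can be processed one frame at a time.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = _random_matrix(rng, n_hidden, n_in)       # input -> hidden
        self.w_ctx = _random_matrix(rng, n_hidden, n_hidden)  # context -> hidden
        self.w_out = _random_matrix(rng, n_out, n_hidden)     # hidden -> output
        self.context = [0.0] * n_hidden                       # previous hidden state

    def reset(self):
        """Clear the context units, e.g. at the start of a new utterance."""
        self.context = [0.0] * len(self.context)

    def step(self, frame):
        """Consume one feature frame and return output probabilities (softmax)."""
        hidden = []
        for w_in_row, w_ctx_row in zip(self.w_in, self.w_ctx):
            s = sum(w * x for w, x in zip(w_in_row, frame))
            s += sum(w * c for w, c in zip(w_ctx_row, self.context))
            hidden.append(math.tanh(s))
        self.context = hidden  # copy the hidden state into the context units
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]


def recognize_stream(net, frames):
    """Yield (phoneme, probability) immediately after each incoming frame."""
    for frame in frames:
        probs = net.step(frame)
        best = max(range(len(probs)), key=probs.__getitem__)
        yield PHONEMES[best], probs[best]


if __name__ == "__main__":
    rng = random.Random(1)
    # Five hypothetical 12-dimensional feature frames (e.g. filter-bank energies).
    frames = [[rng.uniform(-1.0, 1.0) for _ in range(12)] for _ in range(5)]
    net = ElmanRecognizer(n_in=12, n_hidden=8, n_out=len(PHONEMES))
    for t, (phoneme, p) in enumerate(recognize_stream(net, frames)):
        print(t, phoneme, round(p, 3))
```

The context units carry the network's state between calls to `step`, so frames can be fed one at a time from a live front end. Calling `reset` marks an utterance boundary.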
Similar resources
The Time-Sliced Paradigm - A Connectionist Method for Continuous Speech Recognition
This paper presents a new method, called the Time-Slicing Paradigm, for recognizing temporal patterns with neural networks. The method analyzes the speech signal with the aim of recognizing connected speech using less preprocessing of the input signal than other existing neural networks require. Along with the Time-Slicing Paradigm, this work also introduces...
Neural Networks and the Time-Sliced Paradigm for Speech Recognition
The Time-Slicing paradigm is a newly developed method for training neural networks for speech recognition. The neural net is trained to spot the syllables in a continuous stream of speech. It generates a transcription of the utterance, whether a word, a phrase, etc. Combined with a simple error-recovery method, it allows the desired units (words or phrases) to be retrieved. This paradigm uses a recu...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Improving the Performance of Continuous Speech Recognition Systems Using Features Extracted from Speech Manifolds in the Reconstructed Phase Space
Designing new methods for extracting features from the speech signal, and combining the information they provide, is one of the most effective ways to improve the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal has nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...